Splice site identification by idlBNs

Authors

  • Robert Castelo
  • Roderic Guigó
Abstract

MOTIVATION: Computational identification of functional sites in nucleotide sequences is at the core of many algorithms for the analysis of genomic data. This identification is based on statistical parameters estimated from a training set. Often, because of the huge number of parameters, it is difficult to obtain consistent estimators. To simplify the estimation problem, one imposes independence assumptions between the nucleotides along the site. However, this can potentially limit the minimum value of the estimation error.

RESULTS: In this paper, we introduce a novel method for identifying functional sites that finds a reasonable set of independence assumptions among the nucleotides, supported by the data, and uses it to identify the sites by their likelihood ratio. More importantly, in many practical situations it is capable of improving its performance as the training sample size increases. We apply the method to the identification of splice sites, and further evaluate its effect within the context of exon and gene prediction.
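To make the idea of scoring a site by its likelihood ratio concrete, here is a minimal sketch of the simplest case the abstract alludes to: every position is assumed independent of every other. This is not the idlBN method of the paper, which instead searches for a set of independence assumptions supported by the data. The toy sequences, the pseudocount, and the function names `position_frequencies` and `log_likelihood_ratio` are all illustrative assumptions.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def position_frequencies(sequences, pseudocount=1.0):
    """Estimate per-position nucleotide probabilities from equal-length sequences.

    Assumes full independence between positions (an illustrative baseline,
    not the model selected by the paper's method).
    """
    length = len(sequences[0])
    total = len(sequences) + pseudocount * len(ALPHABET)
    model = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in sequences)
        model.append({nt: (counts[nt] + pseudocount) / total for nt in ALPHABET})
    return model

def log_likelihood_ratio(candidate, site_model, background_model):
    """Sum of per-position log ratios: log P(candidate | site) - log P(candidate | background)."""
    return sum(
        math.log(site_model[i][nt] / background_model[i][nt])
        for i, nt in enumerate(candidate)
    )

# Hypothetical toy training sets: donor-like sites versus decoys that also start with GT.
true_sites = ["GTAAGT", "GTGAGT", "GTAAGA", "GTAAGT"]
decoys = ["GTCCTA", "GTTTCC", "GTACCA", "GTCTTG"]

site_model = position_frequencies(true_sites)
background_model = position_frequencies(decoys)

score_site = log_likelihood_ratio("GTAAGT", site_model, background_model)
score_decoy = log_likelihood_ratio("GTCCTA", site_model, background_model)
print(score_site, score_decoy)
```

A candidate is called a site when its score exceeds a chosen threshold. Under full independence each position needs only three free parameters, which keeps estimation stable on small training sets but ignores any dependencies between positions; that trade-off is the one the abstract describes.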


Similar resources

Identification of a Novel Splice Site Mutation in RUNX2 Gene in a Family with Rare Autosomal Dominant Cleidocranial Dysplasia

Introduction: Pathogenic variants of RUNX2, a gene that encodes an osteoblast-specific transcription factor, have been shown as the cause of CCD, which is a rare hereditary skeletal and dental disorder with dominant mode of inheritance and a broad range of clinical variability. Due to the relative lack of clinical complications resulting in CCD, the medical diagnosis of this disorder is challen...


Prediction of locally optimal splice sites in plant pre-mRNA with applications to gene identification in Arabidopsis thaliana genomic DNA.

Prediction of splice site selection and efficiency from sequence inspection is of fundamental interest (testing the current knowledge of requisite sequence features) and practical importance (genome annotation, design of mutant or transgenic organisms). In plants, the dominant variables affecting splice site selection and efficiency include the degree of matching to the extended splice site con...


Hidden Markov Model for Splicing Junction Sites Identification in DNA Sequences

Identification of coding sequence from genomic DNA sequence is the major step in pursuit of gene identification. In the eukaryotic organism, gene structure consists of promoter, intron, start codon, exons and stop codon, etc. and to identify it, accurate labeling of the mentioned segments is necessary. Splice site is the ‘separation’ between exons and introns, the predicted accuracy of which is...


Hybrid Approach Using SVM and MM2 in Splice Site Junction Identification

Prediction of coding region from genomic DNA sequence is the foremost step in the quest of gene identification. In the eukaryotic organism, the gene structure consists of promoter, intron, start codon, exon and stop codon, etc. In the prediction of splice site, which is the separation between exons and introns, the accuracy is lower than 90% even when the sequences adjacent to the splice sites ...



Journal title:
  • Bioinformatics

Volume: 20 Suppl 1

Publication date: 2004